Using Vision to Improve Sound Source Separation

Authors

  • Yukiko Nakagawa
  • Hiroshi G. Okuno
  • Hiroaki Kitano
Abstract

We present a method of improving sound source separation using vision. Sound source separation is an essential function for auditory scene understanding: it separates the streams of sound generated by multiple sound sources. Once the streams are separated, a recognition process such as speech recognition can simply work on a single stream rather than on the mixed sound of several speakers. Separation performance is known to improve with stereo/binaural microphones and microphone arrays, which provide spatial information for separation. However, these methods still leave a positional ambiguity of more than 20 degrees. In this paper, we add visual information to provide more specific and accurate position information. As a result, separation capability improved drastically. In addition, we found that using approximate direction information drastically improves the object tracking accuracy of a simple vision system, which in turn improves the performance of the auditory system. We claim that integrating visual and auditory inputs improves the performance of tasks in each modality, such as sound source separation and object tracking, by bootstrapping.


Similar resources

Incorporating Visual Information into Sound Source Separation

We present a method of improving sound source separation using vision. Sound source separation is an essential function for auditory scene understanding: it separates the streams of sound generated by multiple sound sources. Once the streams are separated, a recognition process such as speech recognition can simply work on a single stream rather than on the mixed sound of several speakers. Th...

Robotic Sound Source Separation using Independent Vector Analysis

Besides haptics and vision, mobile robotic platforms are equipped with audition in order to autonomously navigate and interact with their environment. Speaker and speech recognition, as well as the recognition of different kinds of sounds, are vital tasks for human-robot interaction. In situations where more than one sound source is active, the mixture has to be separated before being passed to the ...

Interactive User-Feedback for Sound Source Separation

IUI'13, March 19–22, 2013, Santa Monica, California, USA. Machine learning techniques used for single-channel sound source separation currently offer no mechanism for user-feedback to improve upon poor results and typically require isolated training data to perform separation. To overco...

Blind Separation of Real World Audio Signals Using Overdetermined Mixtures

We discuss the advantages of using overdetermined mixtures to improve upon blind source separation algorithms that are designed to extract sound sources from acoustic mixtures. A study of the nature of room impulse responses helps us choose an adaptive filter architecture. We use ideal inverses of acquired room impulse responses to compare the effectiveness of different-sized separating filter configu...

Sound Source Separation: Preprocessing for Hearing Aids and Structured Audio Coding

In this paper, we consider the problem of separating different sound sources in multichannel audio signals. Different approaches to the problem of Blind Source Separation (BSS), e.g. Independent Component Analysis (ICA), originally proposed by Herault and Jutten, and extensions to it that include delays, work fine for artificially mixed signals. However, the quality of the separated signals is ...


Publication date: 1999